Mining Thick Skylines over Large Databases

Authors

  • Wen Jin
  • Jiawei Han
  • Martin Ester

Abstract

Recently, there has been growing interest in a new operator, called skyline [3], which returns the objects that are not dominated by any other object with respect to certain measures in a multi-dimensional space. Recent work on the skyline operator [3, 15, 8, 13, 2] focuses on the efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, which may not be desirable in some real applications. In this paper, we propose a novel concept, called the thick skyline, which recommends not only skyline objects but also their nearby neighbors within ε-distance. We develop efficient computation methods, including (1) two algorithms, Sampling-and-Pruning and Indexing-and-Estimating, which find the thick skyline with the help of statistics or indexes in large databases, and (2) a highly efficient Microcluster-based algorithm for mining the thick skyline. The Microcluster-based method not only leads to substantial savings in computation but also provides a concise representation of the thick skyline when cardinality is high. Our experimental performance study shows that the proposed methods are both efficient and effective.
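The two definitions in the abstract, the (thin) skyline and the thick skyline, can be illustrated with a minimal Python sketch. It assumes that smaller values are better in every dimension and that ε-distance is Euclidean distance; both are illustrative assumptions, and the helper names `dominates`, `skyline`, and `thick_skyline` are hypothetical. The brute-force loops below are a plain nested-loop baseline, not the Sampling-and-Pruning, Indexing-and-Estimating, or Microcluster-based algorithms the paper proposes.

```python
from math import dist  # Euclidean distance, Python 3.8+
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumption: smaller is better. p dominates q if it is no worse in
    every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Thin skyline: keep the points that no other point dominates (O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def thick_skyline(points: List[Point], eps: float) -> List[Point]:
    """Skyline points plus every point within Euclidean distance eps of some
    skyline point (skyline points qualify trivially, at distance 0)."""
    sky = skyline(points)
    return [p for p in points if any(dist(p, s) <= eps for s in sky)]


points: List[Point] = [
    (1.0, 5.0), (2.0, 2.0), (5.0, 1.0),  # mutually non-dominated
    (2.5, 2.5),                           # dominated by (2.0, 2.0), but close to it
    (4.0, 4.0),                           # dominated and far from every skyline point
    (1.2, 5.1),                           # dominated by (1.0, 5.0), but close to it
]

print(skyline(points))             # [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0)]
print(thick_skyline(points, 0.8))  # adds (2.5, 2.5) and (1.2, 5.1)
```

In this toy data the thin skyline returns only the three single objects, while the thick skyline with ε = 0.8 also recommends the two dominated points that lie close to a skyline object and still excludes the distant point (4.0, 4.0).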


Similar resources

SkyDist: Data Mining on Skyline Objects

The skyline operator is a well established database primitive which is traditionally applied in a way that only a single skyline is computed. In this paper we use multiple skylines themselves as objects for data exploration and data mining. We define a novel similarity measure for comparing different skylines, called SkyDist. SkyDist can be used for complex analysis tasks such as clustering, cl...

Discovering Skylines of Subgroup Sets

Many tasks in exploratory data mining aim to discover the top-k results with respect to a certain interestingness measure. Unfortunately, in practice top-k solution sets are hardly satisfactory, if only because redundancy in such results is a severe problem. To address this, a recent trend is to find diverse sets of high-quality patterns. However, a ‘perfect’ diverse top-k cannot possibly exist...

Semi-Skylines and Skyline-Snippets

Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. Nevertheless, in this paper we come up with interesting new aspects. O...

UNIVERSITÄT AUGSBURG Semi-Skylines and Skyline Snippets

Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. Nevertheless, in this paper we come up with interesting new aspects. O...

SkyCover: Finding Range-Constrained Approximate Skylines with Bounded Quality Guarantees

Skyline queries retrieve promising data objects that are not dominated in all the attributes of interest. However, in many cases, a user may not be interested in a skyline set computed over the entire dataset, but rather over a specified range of values for each attribute. For example, a user may look for hotels only within a specified budget and/or in a particular area in the city. This leads ...

Publication date: 2004